Advanced methods for knowledge injection in large language models
Abstract
Transformer-based language models have revolutionized Natural Language Processing, driven by advances in language modeling techniques. Current transformer architectures use attention mechanisms to model dependencies in text effectively. Studies have shown that these models embed syntactic structures and knowledge, which explains their performance on tasks involving syntactic and semantic elements. However, transformer-based models are prone to hallucination, in which the knowledge they have incorporated is not used effectively. To address this, methods are emerging that mitigate hallucination by integrating external knowledge sources such as knowledge graphs (e.g., Freebase, WordNet, ConceptNet, ATOMIC). Knowledge graphs represent real-world knowledge as entities and the relationships between them, offering a potential injection point for improving model performance on inference tasks. Injection approaches fall into three groups: input, architectural, and output injections. Input injections modify data preprocessing, architectural injections add layers for knowledge integration, and output injections adjust the error (loss) function to correct knowledge incorporation during training. Despite ongoing research, a universal solution to hallucination remains elusive, and there is no standardized benchmark for comparing injection methods. This study investigates knowledge graphs as one way to mitigate hallucination and examines how they can be integrated into Large Language Models. Comparative experiments across General Language Understanding Evaluation (GLUE) benchmark tasks showed that ERNIE 3.0 and XLNet outperform other injection methods, with average scores of 91.1% and 90.1%, respectively.
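As a rough illustration of the input-injection idea described in the abstract (modifying data preprocessing so that knowledge graph facts reach the model through its input), the following minimal Python sketch appends verbalized triples to a text before it would be tokenized. It is not the method of any model evaluated in the study. The toy graph `TOY_KG`, the helper names `verbalize` and `inject_knowledge`, the exact-match entity lookup, and the `[SEP]` separator string are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# A triple is (head entity, relation, tail entity).
Triple = Tuple[str, str, str]

# Hypothetical toy knowledge graph, keyed by lowercase entity mention.
# A real system would query a resource such as ConceptNet or WordNet.
TOY_KG: Dict[str, List[Triple]] = {
    "paris": [("Paris", "capital_of", "France")],
    "france": [("France", "part_of", "Europe")],
}


def verbalize(triple: Triple) -> str:
    """Turn a triple into a short plain-text statement."""
    head, relation, tail = triple
    return f"{head} {relation.replace('_', ' ')} {tail}."


def inject_knowledge(text: str, kg: Dict[str, List[Triple]],
                     sep: str = " [SEP] ") -> str:
    """Append verbalized facts for every entity mention found in the text.

    Entity linking here is a naive exact match on whitespace-split tokens,
    which is an assumption of this sketch rather than a recommended approach.
    """
    facts: List[str] = []
    for token in text.lower().split():
        key = token.strip(".,;:!?")
        for triple in kg.get(key, []):
            fact = verbalize(triple)
            if fact not in facts:  # avoid repeating the same fact
                facts.append(fact)
    if not facts:
        return text  # nothing to inject, leave the input unchanged
    return text + sep + " ".join(facts)


if __name__ == "__main__":
    print(inject_knowledge("Where is Paris?", TOY_KG))
```

The enriched string would then be passed to the model's usual tokenizer. Architectural and output injections, by contrast, would change the network layers or the loss function rather than the input text.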
Articles in current issue
- Organic-inorganic light-absorbing composites for near infrared part of spectrum
- Study of pyroelectric effect and creation of modified design of phase modulator based on lithium niobate
- Contrast change of the test object image in single-pixel and focal-plane array imaging through a scattering medium
- Synthesis of adaptive observer for nonlinear nonstationary systems
- Automation of search for optimal values of the ethylene oligomerization process parameters
- Electroluminescence of new coordination compounds of europium ions with β-diketones, acetic and butyric acids
- Method for generating multimedia files for the tasks of facial biometrics and its applications
- Advanced methods for knowledge injection in large language models
- Predicting gene-disease associations using a heterogeneous graph neural network
- Computer simulation of the interaction between a shock wave and a wall shielded by an inhomogeneous gas suspension layer
- Flexible and tractable modeling of multivariate data using composite Bayesian networks
- Method for obtaining two-component composite materials with a given thermal conductivity
- Computer simulation of heat and mass transfer processes during water vapor condensation from natural gas combustion products on smooth cylindrical tubes
- Instability of a rectangular CCCC-nanoplate
- Using genetic algorithms to solve the problem of finding the optimal composition of the reaction mixture
- Configurable combustion models of combustion chamber of microturbine engine with possibility of connecting various physico-chemical processes
- Multilevel splitting for rare events estimation in permutation tests
- Method of muscle tissue segmentation in computed tomography images based on preprocessed three-channel images
- Model of adsorption on epitaxial graphene: analytical results